Adaptive Training for Voice Conversion Based on Eigenvoices
نویسندگان
چکیده
In this paper, we describe a novel model training method for one-to-many eigenvoice conversion (EVC). One-to-many EVC is a technique for converting a specific source speaker’s voice into an arbitrary target speaker’s voice. An eigenvoice Gaussian mixture model (EVGMM) is trained in advance using multiple parallel data sets consisting of utterance-pairs of the source speaker and many pre-stored target speakers. The EV-GMM can be adapted to new target speakers using only a few of their arbitrary utterances by estimating a small number of adaptive parameters. In the adaptation process, several parameters of the EV-GMM to be fixed for different target speakers strongly affect the conversion performance of the adapted model. In order to improve the conversion performance in one-to-many EVC, we propose an adaptive training method of the EV-GMM. In the proposed training method, both the fixed parameters and the adaptive parameters are optimized by maximizing a total likelihood function of the EV-GMMs adapted to individual pre-stored target speakers. We conducted objective and subjective evaluations to demonstrate the effectiveness of the proposed training method. The experimental results show that the proposed adaptive training yields significant quality improvements in the converted speech. key words: speech synthesis, voice conversion, Gaussian mixture model, eigenvoice, adaptive training
منابع مشابه
Doctoral Thesis Techniques for Improving Voice Conversion Based on Eigenvoices
Voice conversion (VC) is a technique for converting a source speaker’s voice into another speaker’s voice without changing linguistic information. As a typical approach to VC, a statistical method based on Gaussian mixture model (GMM) is used widely. A GMM is trained as a conversion model using a parallel data set composed of many utterance-pairs of source and target speakers. Although this fra...
متن کاملطراحی یک روش آموزش ناموازی جدید برای تبدیل گفتار با عملکردی بهتر از آموزش موازی
Introduction: The art of voice mimicking by computers, has with the computer have been one of the most challenging topics of speech processing in recent years. The system of voice conversion has two sides. In one side, the speaker is the source that his or her voice has been changed for mimicking the target speaker’s voice (which is on the other side). Two methods of p...
متن کاملEffects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion
This paper introduces speaker adaptive training techniques to tensor-based arbitrary speaker conversion. In voice conversion studies, realization of conversion from/to an arbitrary speaker’s voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC), which is based on an eigenvoice Gaussian mixture model (EV-GMM), was proposed. Although the EVC can effectively const...
متن کاملMon.O1d.06 Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion
This paper introduces speaker adaptive training techniques to tensor-based arbitrary speaker conversion. In voice conversion studies, realization of conversion from/to an arbitrary speaker’s voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC), which is based on an eigenvoice Gaussian mixture model (EV-GMM), was proposed. Although the EVC can effectively const...
متن کاملCross-language voice conversion based on eigenvoices
This paper presents a novel cross-language voice conversion (VC) method based on eigenvoice conversion (EVC). Crosslanguage VC is a technique for converting voice quality between two speakers uttering different languages each other. In general, parallel data consisting of utterance pairs of those two speakers are not available. To deal with this problem, we apply EVC to cross-language VC. First...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEICE Transactions
دوره 93-D شماره
صفحات -
تاریخ انتشار 2010